Statistical models that explicitly account for errors in the measurement of predictor variables.
General Principles
Measurement error refers to the variability in the measurement of a variable, and measurement error can be generated by several factors, such as sampling bias, censoring bias, and group size heterogeneity. It is an important consideration in many fields, including statistics, economics, and engineering, where accurate measurements are crucial for making informed decisions. To account for measurement error, we can use a measurement error model. This model assumes that the measurement of a variable is subject to an error, which can be modeled using a probability distribution. The model can be used to estimate the parameters of the measurement error distribution, such as the mean and variance, and to make predictions about the measurements based on the estimated parameters. Measurement error models are composed models (i.e., models with sub-models) that evaluate different generative processes, starting with the measurement error process, which is then used to generate the observed data.
Example
Below is an example code snippet demonstrating a Bayesian measurement error model using the Bayesian Inference (BI) package. The data consist of three continuous variables (marriage rate, divorce rate, age), and the goal is to estimate the effect of age and marriage rate on the divorce rate while considering that the divorce rate has a measurement error. This example is based on McElreath (2018).
from BI import bi, jnp# Setup device------------------------------------------------m = bi(platform='cpu')# Import Data & Data Manipulation ------------------------------------------------# Importdata_path = m.load.WaffleDivorce(only_path=True)m.data(data_path, sep=';') m.scale(['MedianAgeMarriage', 'Marriage']) # Scaledat =dict( D_obs = m.z_score(m.df['Divorce'].values), D_sd = jnp.array(m.df['Divorce SE'].values / m.df['Divorce'].std()), A = jnp.array(m.df['MedianAgeMarriage'].values), M = jnp.array(m.df['Marriage'].values), N = m.df.shape[0] )m.data_on_model = dat # Send to model (convert to jax array)# Define model ------------------------------------------------def model(D_obs, D_sd, A, N, M): a = m.dist.normal(0, 0.2, name ='a') beta = m.dist.normal(0, 0.5, name ='beta') eta = m.dist.normal(0, 0.5, name ='eta') s = m.dist.exponential(1, name ='s') mu = a + beta * A + eta * M D_true = m.dist.normal(mu, s, name ='D_true') m.dist.normal(D_true , D_sd, obs = D_obs) # Run MCMC ------------------------------------------------m.fit(model) # Optimize model parameters through MCMC sampling# Summary ------------------------------------------------m.summary() # Get posterior distributions
library(BayesianInference)jnp = reticulate::import('jax.numpy')# Setup platform------------------------------------------------m=importBI(platform='cpu')# Import data ------------------------------------------------m$data(paste(system.file(package ="BayesianInference"),"/data/WaffleDivorce.csv", sep =''), sep=';')m$scale(list('MedianAgeMarriage', 'Marriage'))m$data_on_model$D_obs = m$z_score(jnp$array(m$df['Divorce']))m$data_on_model$D_sd = jnp$array(m$df['Divorce SE']) /sd(unlist(m$df['Divorce']))m$data_on_model$A = jnp$array(m$df['MedianAgeMarriage'])m$data_on_model$M = jnp$array(m$df['Marriage'])m$data_on_model$N =as.integer(nrow(m$df))# Define model ------------------------------------------------model <-function(D_obs, D_sd, A, N, M){ a =bi.dist.normal(0, 0.2, name ='a') beta =bi.dist.normal(0, 0.5, name ='beta') eta =bi.dist.normal(0, 0.5, name ='eta') s =bi.dist.exponential(1, name ='s') mu = a + beta * A + eta * M D_true =bi.dist.normal(mu, s, name ='D_true') bi.dist.normal(D_true , D_sd, obs = D_obs) }# Run MCMC ------------------------------------------------m$fit(model) # Optimize model parameters through MCMC sampling# Summary ------------------------------------------------m$summary() # Get posterior distribution
usingBayesianInference# Setup device------------------------------------------------m =importBI(platform="cpu")# Import Data & Data Manipulation ------------------------------------------------# Importdata_path = m.load.WaffleDivorce(only_path=true)m.data(data_path, sep=";") m.scale(["MedianAgeMarriage", "Marriage"]) # Scaledat =pydict( D_obs = m.z_score(m.df["Divorce"].values), D_sd = jnp.array(m.df["Divorce SE"].values / m.df["Divorce"].std()), A = jnp.array(m.df["MedianAgeMarriage"].values), M = jnp.array(m.df["Marriage"].values), N = m.df.shape[0] )m.data_on_model = dat # Send to model (convert to jax array)# Define model ------------------------------------------------@BIfunctionmodel(D_obs, D_sd, A, N, M) a = m.dist.normal(0, 0.2, name ="a") beta = m.dist.normal(0, 0.5, name ="beta") eta = m.dist.normal(0, 0.5, name ="eta") s = m.dist.exponential(1, name ="s") mu = a + beta * A + eta * M D_true = m.dist.normal(mu, s, name ="D_true") m.dist.normal(D_true , D_sd, obs = D_obs) end# Run mcmc ------------------------------------------------m.fit(model) # Optimize model parameters through MCMC sampling# Summary ------------------------------------------------m.summary() # Get posterior distributions
Mathematical Details
Bayesian formulation
D_i^* \sim \text{Normal}(D_i, \varsigma_i)
D_i \sim \text{Normal}(\mu_i, \sigma)
\mu_i = \alpha + \beta A_i + \eta M_i
\sigma \sim \text{Normal}(1)
where:
D_i^* is the observed divorce rate.
D_i is the true divorce rate.
\mu_i is the mean of the true divorce rate.
\sigma is the standard deviation of the true divorce rate.
\alpha is the intercept term.
\beta is the regression coefficient for age.
\eta is the regression coefficient for marriage rate.
Notes
Note
This is an approach that can be extended to any kind of model previously described. For example, one could generate a Bernoulli measurement error model by generating a process for the probabilities of success and failure. We can even go further by potentially having an error rate that is present only in one of the two outcomes.
Reference(s)
McElreath, Richard. 2018. Statistical Rethinking: A Bayesian course with examples in R and Stan. Chapman; Hall/CRC.